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that many of the results available for the independent case 
I. INTRODUCTION


A prediction interval is a random interval that contains
the value of a future observation or some function of
future observations and whose end points are functions of
previous sample values. Such an interval provides an
indication of the uncertainty in the future observations.
More specifically, a 100r percent prediction interval for
the value of a future sample is an interval that is based
on a previous sample and encloses the future observations
with probability r, independent of the values of the
distribution parameters, such as the mean or the standard
deviation. A prediction interval needs to be distinguished
both from a confidence interval and from a tolerance interval:
a confidence interval encloses the value of an unknown
parameter, and a tolerance interval is an interval within
which a specified proportion of the population values will
lie with a specified probability.

In many practical problems, it would be of interest to
construct a prediction interval for the values of the next
k sample values from a population. For example, if only
one machine is available for testing and we must perform
trials sequentially, a prediction interval could provide
helpful information about the total time needed to complete
the experiment or perhaps the number of trials it would be
possible to perform. Another application of prediction
intervals is in forecasting before a planned experiment is
completed. In an experiment where each observation is
expensive or where observations can be made only infrequently,
prediction intervals may be helpful in reaching a decision
on the profitability of continuing the experiment at
intermediate points in the experiment. For example, when the
experiment concerns a physical input or output, preliminary
estimates of the ultimate amount of needed input material
or of the ultimate storage needed for the output might be
helpful. In other situations where the random variable is
the "time until occurrence of an event," and where physical
limitations prevent the concurrent running of all planned
trials, prediction intervals might provide helpful
information concerning the total time until completion of the
planned experiment. Prediction intervals are also of
frequent interest to a typical consumer of one or a small
number of units of a given product. Such an individual is
generally more directly concerned with the future performance
of his specific sample than with the process from which the
sample had been selected. A prediction interval to contain
each of the values of the sample would then provide him
with an interval within which he may expect the performance
of all his units to be located with a high probability.
Based upon his experience with a previous sample of 10
bulbs, a consumer might wish to construct an interval which
would have a high probability of including the performance
values of each of three additional bulbs.








In this thesis we derive prediction intervals for one
future sample observation as well as simultaneous intervals
for a specified number of future sample observations when
the samples are correlated. These results are obtained as
extensions of results due to Hahn [5]. He derived similar
intervals for the case where the samples are independent and
identically distributed as $N(\mu,\sigma^2)$.

In Chapter III it is shown that Hahn's prediction interval
for the standard deviation of a single future sample is
valid even in the case where the sample values are correlated
and have a multivariate normal distribution with mean vector
$\mu = (\mu,\mu,\ldots,\mu)'$ and covariance matrix $V$ having the following
structure:

$$V_{n\times n} = \tfrac{1}{2}\left(H_{n\times n} + H'_{n\times n}\right) + a\left(I_{n\times n} - E_{n\times n}\right) \qquad (1.1)$$

where

$$H_{n\times n} = \begin{pmatrix} h_1 & h_1 & \cdots & h_1 \\ h_2 & h_2 & \cdots & h_2 \\ \vdots & \vdots & & \vdots \\ h_n & h_n & \cdots & h_n \end{pmatrix},$$

$H'$ is the transpose of $H$, $h_i$ $(i=1,2,3,\ldots,n)$ and
$a$ are positive constants, $I$ is an $n\times n$ identity matrix,
and $E$ is an $n\times n$ matrix all of whose elements are
unity.


Simultaneous prediction intervals for the standard 
deviations of k future samples are also derived and examples 
illustrating the results are provided. 

A covariance matrix with the above structure occurs in
the study of random effects models in analysis of variance.
If samples are drawn from a normal distribution $N(\mu,\sigma^2)$
and it is assumed that $\mu$ itself is normally distributed
as $N(\eta,\sigma_\mu^2)$, then it can be shown that the sample values
have a multivariate normal distribution with mean vector
$\eta = (\eta,\eta,\ldots,\eta)'$ and covariance matrix

$$V = \begin{pmatrix} \sigma^2+\sigma_\mu^2 & \sigma_\mu^2 & \cdots & \sigma_\mu^2 \\ \sigma_\mu^2 & \sigma^2+\sigma_\mu^2 & \cdots & \sigma_\mu^2 \\ \vdots & \vdots & & \vdots \\ \sigma_\mu^2 & \sigma_\mu^2 & \cdots & \sigma^2+\sigma_\mu^2 \end{pmatrix}.$$

It can be seen that the matrix $V$ has the same structure
as in (1.1) by letting $h_1 = h_2 = \cdots = h_n = \sigma^2+\sigma_\mu^2$ and $a = \sigma^2$.

A possible application of the results of this thesis is in
the following situation. From a lot containing a large
number of guns, n are selected at random. Each of these
guns is then fired k times and the resulting miss distances
from a target are measured. Based on the mean of the
measured miss distances, a prediction interval for the
miss distance of a randomly chosen gun may be constructed.

Chapter IV deals with procedures for constructing a
prediction interval to contain a single additional
observation and also with constructing a simultaneous prediction
interval to contain all k additional future observations,
when the samples are correlated and the covariance matrix
has the structure as in equation (1.1).
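The correspondence between the random effects covariance matrix and structure (1.1) can be checked numerically. The sketch below is only an illustration: the helper name `structured_cov` and the variance components are assumptions chosen for the example, not values from the thesis.

```python
import numpy as np

def structured_cov(h, a):
    """Return V = (H + H')/2 + a(I - E), where row i of H is (h_i, ..., h_i)."""
    h = np.asarray(h, dtype=float)
    n = h.size
    H = np.tile(h[:, None], (1, n))      # row i is constant and equal to h[i]
    return 0.5 * (H + H.T) + a * (np.eye(n) - np.ones((n, n)))

# Assumed (illustrative) variance components of the random effects model.
sigma2, sigma_mu2, n = 2.0, 0.5, 4

# Structure (1.1) with h_1 = ... = h_n = sigma^2 + sigma_mu^2 and a = sigma^2.
V = structured_cov(np.full(n, sigma2 + sigma_mu2), a=sigma2)

# Random effects covariance: sigma^2 + sigma_mu^2 on the diagonal, sigma_mu^2 elsewhere.
V_random_effects = sigma2 * np.eye(n) + sigma_mu2 * np.ones((n, n))

print(np.allclose(V, V_random_effects))
```

With equal $h_i$ the matrix $\tfrac12(H+H')$ is a constant matrix, so the term $a(I-E)$ is what separates the diagonal from the off-diagonal entries.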
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II. SUMMARY OF KNOWN RESULTS 


A. DEFINITIONS AND NOTATIONS 


Let $X_{ij}$, $i = 0,1,2,\ldots,k$; $j = 1,2,3,\ldots,n_i$, be $k+1$ sets
of random samples of size $n_i$ from a normal distribution
$N(\mu,\sigma^2)$. The $n_0$ samples for $i=0$ are considered as the
given sample and the remaining k sets are future samples
for which prediction intervals are needed.

Let

$$\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$$

and

$$S_i^2 = \frac{1}{n_i}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_i\right)^2$$

where $i = 0,1,2,3,\ldots,k$ and $j = 1,2,3,\ldots,n_i$.


B. PREDICTION INTERVALS FOR THE STANDARD 
DEVIATIONS OF FUTURE SAMPLES 


It is well known that $n_0S_0^2/\sigma^2$ and $n_iS_i^2/\sigma^2$
have Chi-square distributions with $n_0-1$ and $n_i-1$ degrees
of freedom respectively and that they are mutually independent.
Thus, $S_i^2/S_0^2$ follows an F distribution with $n_i-1$ and
$n_0-1$ degrees of freedom respectively, $i=1,2,3,\ldots,k$.

Therefore, a prediction interval to contain the standard
deviation $S_1$ of a single future sample of $n_1$ observations is
obtained from

$$\Pr\{S_0\,F(n_1-1,n_0-1;(1-r)/2)^{1/2} < S_1 < S_0\,F(n_1-1,n_0-1;(1+r)/2)^{1/2}\} = r \qquad (2.1)$$

where $F(n_1-1,n_0-1;(1-r)/2)$ and $F(n_1-1,n_0-1;(1+r)/2)$
are the lower and upper percentage points of the F distribution with
$n_1-1$ and $n_0-1$ degrees of freedom respectively.

A two-sided 100r% prediction interval to contain the
standard deviation $S_1$ of a single future sample of size $n_1$
is

$$\left(F(n_1-1,n_0-1;(1-r)/2)^{1/2}\,S_0,\; F(n_1-1,n_0-1;(1+r)/2)^{1/2}\,S_0\right) \qquad (2.2)$$
To obtain a simultaneous interval to contain the
standard deviations of k future samples assume that
$n_i = m$, $i = 1,2,3,\ldots,k$, and let

$$\max_i \frac{S_i^2}{S_0^2} = W_U(k,m-1,n_0-1)$$

and

$$\min_i \frac{S_i^2}{S_0^2} = W_L(k,m-1,n_0-1).$$

The random variables $W_U(k,m-1,n_0-1)$ and $W_L(k,m-1,n_0-1)$ are
known as the studentized largest and studentized smallest
Chi-square variates, respectively, in the statistical
literature and some tables [1] of the percentage points of
their distributions are available. Let $D_U(k,m-1,n_0-1;r)$
and $D_L(k,m-1,n_0-1;1-r)$ denote the upper 100r% and the lower
100(1-r)% points of the distribution of $W_U(k,m-1,n_0-1)$
and $W_L(k,m-1,n_0-1)$, respectively.

Then

$$\Pr\{\max_i S_i < D_U(k,m-1,n_0-1;r)^{1/2}\,S_0\} = r$$

and

$$\Pr\{\min_i S_i > D_L(k,m-1,n_0-1;1-r)^{1/2}\,S_0\} = r \qquad (2.3)$$

A simultaneous prediction interval to contain all the
standard deviations $S_i$, $i = 1,2,\ldots,k$, is given by

$$\left(D_L(k,m-1,n_0-1;1-r)^{1/2}\,S_0,\; D_U(k,m-1,n_0-1;r)^{1/2}\,S_0\right)$$


C. PREDICTION INTERVALS FOR THE OBSERVATIONS IN A FUTURE 
SAMPLE 


Let $X_1, X_2, \ldots, X_n$ be the values of n given samples
from a normal distribution $N(\mu,\sigma^2)$ and let $X_{n+1}, X_{n+2}, X_{n+3},
\ldots, X_{n+k}$ be the values of k future independent observations
to be drawn from the same distribution. To get a prediction
interval to contain a single additional observation $X_{n+1}$,
we proceed as follows.

Let

$$Z_1 = X_{n+1} - \bar{X};$$

the expected value of $Z_1$ is zero and the standard deviation
of $Z_1$ is $\sigma\left(1+\tfrac{1}{n}\right)^{1/2}$.

It is easily seen that the standardized variable

$$\frac{Z_1}{\sigma\left(1+\frac{1}{n}\right)^{1/2}} = \frac{X_{n+1} - \bar{X}}{\sigma\left(1+\frac{1}{n}\right)^{1/2}}$$

and

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

are independent. Therefore,

$$T = \frac{Z_1\Big/\sigma\left(1+\frac{1}{n}\right)^{1/2}}{\left[\dfrac{(n-1)S^2}{\sigma^2(n-1)}\right]^{1/2}} = \frac{X_{n+1}-\bar{X}}{S\left(1+\frac{1}{n}\right)^{1/2}}$$

follows a t distribution with n-1 degrees of freedom.
That is,

$$\Pr\{\bar{X} + t(n-1;(1-r)/2)(1+1/n)^{1/2}S < X_{n+1} < \bar{X} + t(n-1;(1+r)/2)(1+1/n)^{1/2}S\} = r \qquad (2.4)$$

where $t(n-1;(1-r)/2)$ and $t(n-1;(1+r)/2)$ are the lower and upper
percentage points of the t-distribution with n-1 degrees of freedom.

Hence, a two-sided 100r% prediction interval to contain
a single future observation $X_{n+1}$ is

$$\bar{X} \pm t(n-1;(1+r)/2)\,(1+1/n)^{1/2}\,S \qquad (2.5)$$


To determine a simultaneous prediction interval to
contain all k future observations, first let

$$Z_i = X_{n+i} - \bar{X}, \qquad i = 1,2,3,\ldots,k.$$

Then, the expected value of $Z_i$ is zero and the variance of
$Z_i$ is

$$\sigma^2\left(1+\frac{1}{n}\right),$$

and it can be shown that $\mathrm{cov}(Z_i,Z_j) = \sigma^2/n$ for all i and j,
$i \neq j$. The transformed variables

$$\frac{Z_i}{\sigma\left(1+\frac{1}{n}\right)^{1/2}} = \frac{X_{n+i}-\bar{X}}{\sigma\left(1+\frac{1}{n}\right)^{1/2}}, \qquad i = 1,2,3,\ldots,k,$$

have standard normal distributions.
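The moments quoted for the $Z_i$ follow directly from the independence of the observations. A short worked derivation is sketched here, using only $Z_i = X_{n+i}-\bar{X}$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$; the correlation in the last line is included as a consequence of the two lines before it.

```latex
\begin{align*}
\operatorname{Var}(Z_i) &= \operatorname{Var}(X_{n+i}) + \operatorname{Var}(\bar{X})
   = \sigma^2 + \frac{\sigma^2}{n} = \sigma^2\left(1+\frac{1}{n}\right),\\
\operatorname{cov}(Z_i,Z_j) &= \operatorname{cov}\left(X_{n+i}-\bar{X},\; X_{n+j}-\bar{X}\right)
   = \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad i \neq j,\\
\operatorname{corr}(Z_i,Z_j) &= \frac{\sigma^2/n}{\sigma^2\left(1+1/n\right)} = \frac{1}{n+1}.
\end{align*}
```

The cross terms vanish in the first line because $X_{n+i}$ is independent of $\bar{X}$, and in the second line because $X_{n+i}$ and $X_{n+j}$ are independent of each other and of $\bar{X}$.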

Since $(n-1)S^2/\sigma^2$ is independent of the $Z_i$ and has a
Chi-square distribution with n-1 degrees of freedom, each
of the ratios

$$T_i = \frac{X_{n+i}-\bar{X}}{S\left(1+\frac{1}{n}\right)^{1/2}}, \qquad i = 1,2,3,\ldots,k,$$

follows a Student's t-distribution with n-1 degrees of
freedom and the $T_i$ are correlated. The random variables
$T_1, T_2, T_3, \ldots, T_k$ are jointly distributed according to the
multivariate generalization of the Student's t-distribution
with n-1 degrees of freedom. Tables of the percentage
points of this distribution are given in [4]. If u is
defined as the solution of the integral equation

$$r = \int_{-u}^{u}\int_{-u}^{u}\cdots\int_{-u}^{u} f(t_1,t_2,\ldots,t_k)\,dt_1\,dt_2\cdots dt_k$$

where $f(t_1,t_2,\ldots,t_k)$ is the joint probability density
function of the multivariate t-distribution with n-1 degrees
of freedom, then

$$\Pr\{\bar{X} - u(1+1/n)^{1/2}S < X_{n+i} < \bar{X} + u(1+1/n)^{1/2}S,\; i = 1,2,\ldots,k\} = r.$$

The resulting 100r% simultaneous prediction interval to
contain the values $X_{n+1}, X_{n+2}, X_{n+3}, \ldots, X_{n+k}$ of all k additional
observations is

$$\bar{X} \pm u\,(1+1/n)^{1/2}\,S \qquad (2.6)$$


D. SOME THEOREMS USED IN DERIVING THE RESULTS IN THE THESIS 



Theorem 1. If $X$ is distributed $N(\mu,\sigma^2 I)$, then $X'AX/\sigma^2$
is distributed as $\chi'^2(k,\lambda)$, where $\lambda = \mu'A\mu/2\sigma^2$ and k is the rank
of $A$, if and only if $A$ is idempotent.

Theorem 2. If $X$ is distributed $N(\mu,V)$, then $X'BX$ is
distributed as $\chi'^2(k,\lambda)$, where $\lambda = \tfrac12\mu'B\mu$ and k is the rank
of $B$, if and only if $BV$ is idempotent.

Theorem 3. If $X$ is distributed $N(\mu,V)$, then $X'AX$ and
$X'BX$ are independent if and only if $AVB = 0$.

Theorem 4. If $X$ is distributed $N(\mu,V)$, then $Y = C'X$ and
$X'AX$ are independent if and only if $C'VA = 0$.

Theorem 5. (Hogg and Craig theorem)
Let $Q = Q_1 + Q_2 + \cdots + Q_{k-1} + Q_k$, where $Q, Q_1, Q_2, \ldots, Q_k$ are
$k+1$ random variables that are real quadratic forms in the
observations of a random sample of size n from a normal
distribution $N(\mu,\sigma^2)$. Let $Q/\sigma^2$ be $\chi^2(r)$, let $Q_i/\sigma^2$ be
$\chi^2(r_i)$, $i = 1,2,\ldots,k-1$, and let $Q_k$ be non-negative. Then
the random variables $Q_1, Q_2, Q_3, \ldots, Q_k$ are mutually
stochastically independent and, hence, $Q_k/\sigma^2$ is
$\chi^2\left(r_k = r - \sum_{i=1}^{k-1} r_i\right)$.

Theorem 6. (Baldessari theorem). Let $X_{n\times 1}$ have a
multivariate normal distribution with mean vector $\mu_{n\times 1}$ and
covariance matrix $V_{n\times n}$, and let $B_0, B_1, B_2, \ldots, B_k$
be $(n\times n)$ idempotent matrices satisfying

$$\sum_{j=0}^{k} B_j = I_{n\times n} - n^{-1}E_{n\times n}$$

where $I_{n\times n}$ is the $(n\times n)$ identity matrix and $E_{n\times n}$ is
an $(n\times n)$ matrix all of whose elements are unity. Let
a be a positive constant. Then, a necessary and sufficient
condition for $X'B_jX/a$, $j = 0,1,2,\ldots,k$, to be mutually
independent and have non-central Chi-square distributions
with $r_j$ ($r_j$ = rank of $B_j$, $j = 0,1,2,\ldots,k$) degrees of freedom
is that the covariance matrix $V$ has the following structure

$$V_{n\times n} = \tfrac12\left(H_{n\times n} + H'_{n\times n}\right) + a\left(I_{n\times n} - E_{n\times n}\right)$$

where $H_{n\times n}$, $I_{n\times n}$ and $E_{n\times n}$ are defined in (1.1).

III. PREDICTION INTERVALS TO CONTAIN THE STANDARD DEVIATIONS
OF FUTURE SAMPLES -- CORRELATED CASE


Hahn [5] derived prediction intervals to contain the
standard deviations of future samples of independent and
identically distributed random variables from a normal
distribution with unknown mean and unknown standard
deviation. In this chapter we extend Hahn's results to the case
where the samples are correlated and have a special type of
covariance structure.

Section A deals with the procedures for constructing a
prediction interval to contain the standard deviation of a
single future sample of $n_1$ observations, based on a
given sample of size $n_0$.
Section B deals with the construction of simultaneous
prediction intervals to contain the standard deviations $S_i$,
$i = 1,2,3,\ldots,k$, of k future samples of sizes $n_i$.
Numerical examples are given in Section C.
A. PREDICTION INTERVAL TO CONTAIN THE STANDARD DEVIATION
OF A SINGLE FUTURE SAMPLE

Let $X_{01}, X_{02}, X_{03}, \ldots, X_{0n_0}$ be the values of a given sample
and $X_{11}, X_{12}, X_{13}, \ldots, X_{1n_1}$ the values of a future sample. It
is required to construct a prediction interval for the
standard deviation $S_1$ of the future sample based on the
standard deviation $S_0$ of the given sample.

Let

$$\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$$

and

$$S_i^2 = \frac{1}{n_i}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i\right)^2 \qquad \text{where } i = 0,1.$$

Let

$$\bar{X} = \frac{1}{N}\sum_{i=0}^{1}\sum_{j=1}^{n_i} X_{ij} \qquad \text{where } N = n_0+n_1$$

and

$$S^2 = \frac{1}{N}\sum_{i=0}^{1}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}\right)^2$$

denote the sample mean and variance of the combined sample
of size $N = n_0+n_1$.

Since $\bar{X}_1 = (N\bar{X} - n_0\bar{X}_0)/n_1$, the sum of squares $NS^2$ can be
partitioned as follows:

$$\begin{aligned}
NS^2 &= \sum_{i=0}^{1}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}\right)^2
      = \sum_{i=0}^{1}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i+\bar{X}_i-\bar{X}\right)^2 \\
     &= \sum_{i=0}^{1}\left[\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i\right)^2 + n_i\left(\bar{X}_i-\bar{X}\right)^2\right] \\
     &= n_0S_0^2 + n_1S_1^2 + n_0\left(\bar{X}_0-\bar{X}\right)^2 + n_1\left\{(N\bar{X}-n_0\bar{X}_0)/n_1 - \bar{X}\right\}^2 \\
     &= n_0S_0^2 + n_1S_1^2 + \frac{n_0N}{n_1}\left(\bar{X}-\bar{X}_0\right)^2 \qquad (3.1)
\end{aligned}$$

Expressing the sum of squares in (3.1) as quadratic forms
we can write the equation as

$$X'B_3X = X'B_0X + X'B_1X + X'B_2X \qquad (3.2)$$

where $B_3$, $B_0$ and $B_1$ are idempotent matrices and

$$B_3 = I_{N\times N} - N^{-1}E_{N\times N} \quad\text{and}\quad B_i = I_{n_i\times n_i} - n_i^{-1}E_{n_i\times n_i}, \quad i = 0,1.$$

If $X_{01}, X_{02}, \ldots, X_{0n_0}, X_{11}, X_{12}, \ldots, X_{1n_1}$ are
independent and have identical normal distributions with
mean $\mu$ and variance $\sigma^2$, then it is known that $NS^2/\sigma^2$, $n_0S_0^2/\sigma^2$
and $n_1S_1^2/\sigma^2$ have Chi-square distributions with $N-1$, $n_0-1$
and $n_1-1$ degrees of freedom respectively and $n_0N(\bar{X}-\bar{X}_0)^2/n_1$
is non-negative. Thus, Hogg and Craig's theorem (Theorem 5)
applies to equation (3.1). Therefore, the three quadratic
forms on the right hand side of (3.1) are mutually independent
and $n_0N(\bar{X}-\bar{X}_0)^2/n_1\sigma^2$ has a Chi-square distribution with 1
degree of freedom. It also follows that the matrix $B_2$ in
equation (3.2) is also idempotent.
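The partition (3.1) is an algebraic identity, so it can be checked on any data. The following sketch uses simulated values; the sample sizes and the random seed are assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1 = 6, 10                          # illustrative sample sizes
x0 = rng.normal(size=n0)                # given sample
x1 = rng.normal(size=n1)                # future sample
pooled = np.concatenate([x0, x1])
N = n0 + n1

# np.var uses divisor n by default, matching the definitions preceding (3.1).
total = N * pooled.var()                                   # N S^2
within = n0 * x0.var() + n1 * x1.var()                     # n0 S0^2 + n1 S1^2
between = n0 * N * (pooled.mean() - x0.mean()) ** 2 / n1   # last term of (3.1)

print(np.isclose(total, within + between))
```

The last term depends on the data only through the two means, which is why it carries a single degree of freedom.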

Now, suppose $X' = (X_{01}, X_{02}, X_{03}, \ldots, X_{0n_0}, X_{11}, X_{12}, X_{13}, \ldots, X_{1n_1})$
is a vector random variable having a multivariate normal
distribution with mean $\mu = (\mu,\mu,\mu,\ldots,\mu)'$ and covariance
matrix $V_{N\times N}$ which has the following structure:

$$V_{N\times N} = \tfrac12\left(H_{N\times N} + H'_{N\times N}\right) + a\left(I_{N\times N} - E_{N\times N}\right) \qquad (3.3)$$

where

$$H_{N\times N} = \begin{pmatrix} h_1 & h_1 & \cdots & h_1 \\ h_2 & h_2 & \cdots & h_2 \\ \vdots & \vdots & & \vdots \\ h_N & h_N & \cdots & h_N \end{pmatrix},$$

$I_{N\times N}$ is an $N\times N$ identity matrix, $E_{N\times N}$ is an $N\times N$ matrix whose
elements are all unity, and $a$ and $h_i$, $i = 1,2,3,\ldots,N$, are
positive constants.

To obtain a prediction interval for $S_1$, we start with
equations (3.1) and (3.2). Since the matrices $B_0$, $B_1$, $B_2$
and $B_3$ are idempotent matrices and $B_3 = \sum_{i=0}^{2} B_i = I - N^{-1}E$,
all the conditions of the Baldessari theorem (Theorem 6)
are now satisfied. Therefore, the three quadratic forms of
(3.2) on the right hand side, each divided by $a$, have central
Chi-square distributions with $n_0-1$, $n_1-1$ and 1 degrees of
freedom respectively and are mutually independent.

Thus, the random variable

$$\frac{X'B_1X}{a(n_1-1)} \Big/ \frac{X'B_0X}{a(n_0-1)} = \frac{(n_1-1)S_1^2}{a(n_1-1)} \Big/ \frac{(n_0-1)S_0^2}{a(n_0-1)} = \frac{S_1^2}{S_0^2}$$

follows an F-distribution with $n_1-1$ and $n_0-1$ degrees of
freedom and we obtain a prediction interval for $S_1$ as:

$$\Pr\left\{F(n_1-1,n_0-1;(1-r)/2) < \frac{S_1^2}{S_0^2} < F(n_1-1,n_0-1;(1+r)/2)\right\} = r$$

or equivalently,

$$\Pr\{S_0\,F(n_1-1,n_0-1;(1-r)/2)^{1/2} < S_1 < S_0\,F(n_1-1,n_0-1;(1+r)/2)^{1/2}\} = r \qquad (3.4)$$

where r is the chosen confidence coefficient and
$F(n_1-1,n_0-1;(1-r)/2)$ and $F(n_1-1,n_0-1;(1+r)/2)$ are the
appropriate percentage points of the F distribution with
$n_1-1$ and $n_0-1$ degrees of freedom. This yields the following
two-sided 100r% prediction interval to contain the standard
deviation $S_1$ of $n_1$ future observations:

$$\left(S_0\,F(n_1-1,n_0-1;(1-r)/2)^{1/2},\; S_0\,F(n_1-1,n_0-1;(1+r)/2)^{1/2}\right) \qquad (3.5)$$

This prediction interval for $S_1$ is exactly the same as the
one obtained by Hahn [5] for the independent case.
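The claim that the interval keeps its coverage under structure (3.3) can be illustrated by simulation. The sketch below is a Monte Carlo check under stated assumptions, not part of the derivation: the values of $h_i$, $a$ and the common mean are arbitrary choices that make $V$ positive definite, the sample standard deviations are computed with divisor $n_i-1$ so that their squared ratio is the F variable, and the two F percentage points are the table values quoted in Section C.

```python
import numpy as np

def structured_cov(h, a):
    """V = (H + H')/2 + a(I - E), where row i of H is (h_i, ..., h_i)."""
    h = np.asarray(h, dtype=float)
    n = h.size
    H = np.tile(h[:, None], (1, n))
    return 0.5 * (H + H.T) + a * (np.eye(n) - np.ones((n, n)))

n0, n1 = 6, 10
N = n0 + n1
a = 1.0
h = np.linspace(2.0, 3.0, N)            # assumed values; V is positive definite for them
V = structured_cov(h, a)
L = np.linalg.cholesky(V)               # raises LinAlgError if V is not positive definite

rng = np.random.default_rng(12345)
reps = 100_000
X = 5.0 + rng.standard_normal((reps, N)) @ L.T   # rows are N(5*1, V); the mean 5.0 is arbitrary

s0 = X[:, :n0].std(axis=1, ddof=1)      # given sample, divisor n0 - 1
s1 = X[:, n0:].std(axis=1, ddof=1)      # future sample, divisor n1 - 1

# Table values: F(9,5;0.975) = 6.68 and F(5,9;0.975) = 4.48 = 1 / F(9,5;0.025).
lower = s0 * 4.48 ** -0.5
upper = s0 * 6.68 ** 0.5
coverage = np.mean((lower < s1) & (s1 < upper))
print(round(float(coverage), 3))
```

The estimated coverage should lie close to the nominal 0.95 even though the observations are correlated and have unequal variances.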


B. A SIMULTANEOUS PREDICTION INTERVAL TO CONTAIN THE 
STANDARD DEVIATION OF EACH OF k FUTURE SAMPLES 


As in the previous section, let $X_{01}, X_{02}, X_{03}, \ldots, X_{0n_0}$
be the values of a given random sample and let
$X_{11}, X_{12}, \ldots, X_{1n_1}$; $X_{21}, X_{22}, \ldots, X_{2n_2}$; $\ldots$;
$X_{k1}, X_{k2}, \ldots, X_{kn_k}$ be the values of k sets of future
samples from a normal distribution with unknown mean $\mu$ and
unknown standard deviation $\sigma$.

Let

$$\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, \qquad
S_i^2 = \frac{1}{n_i}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i\right)^2,$$

where $i = 0,1,2,3,\ldots,k$ and let

$$N = \sum_{i=0}^{k} n_i.$$

Also let

$$\bar{X} = \frac{1}{N}\sum_{i=0}^{k}\sum_{j=1}^{n_i} X_{ij}$$

and

$$S^2 = \frac{1}{N}\sum_{i=0}^{k}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}\right)^2$$

be the mean and variance of the pooled sample of $N = \sum_{i=0}^{k} n_i$
observations.

The sum of squares $NS^2$ can be partitioned as

$$\begin{aligned}
NS^2 &= \sum_{i=0}^{k}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}\right)^2
      = \sum_{i=0}^{k}\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i+\bar{X}_i-\bar{X}\right)^2 \\
     &= \sum_{i=0}^{k}\left[\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i\right)^2 + n_i\left(\bar{X}_i-\bar{X}\right)^2\right] \\
     &= n_0S_0^2 + n_1S_1^2 + n_2S_2^2 + \cdots + n_kS_k^2 + \sum_{i=0}^{k} n_i\left(\bar{X}_i-\bar{X}\right)^2 \qquad (3.6)
\end{aligned}$$

It follows that $(N-1)S^2/\sigma^2$ has a Chi-square distribution
with $N-1$ degrees of freedom and $(n_i-1)S_i^2/\sigma^2$, $i = 0,1,2,\ldots,k$,
have Chi-square distributions with $n_i-1$ degrees of freedom
respectively, and the last term of (3.6) is non-negative.
Applying the Hogg and Craig theorem (Theorem 5) we can conclude
that the last term of (3.6) also has a Chi-square distribution
with k degrees of freedom, since
$(N-1) - \sum_{i=0}^{k}(n_i-1) = (N-1)-(N-(k+1)) = k$,
and that all the sums of squares on the right
hand side of equation (3.6) are mutually independent.

Expressing these sums of squares as quadratic forms we
can write equation (3.6) as

$$X'BX = X'B_0X + X'B_1X + X'B_2X + \cdots + X'B_kX + X'B_{k+1}X \qquad (3.7)$$

where

$$B = I_{N\times N} - N^{-1}E_{N\times N} \quad\text{and}\quad B_i = I_{n_i\times n_i} - n_i^{-1}E_{n_i\times n_i}, \quad i = 0,1,2,\ldots,k,$$

are idempotent matrices (see Theorem 1).


Now, suppose $X_{N\times 1}$ is a random vector having a multivariate
normal distribution with mean $\mu_{N\times 1} = (\mu,\mu,\mu,\ldots,\mu)'$ and
covariance matrix $V_{N\times N}$ which has the form (3.3).

The partition of $NS^2$ given in equations (3.6) and (3.7)
is valid for this case also. Thus, we know $B$ and $B_i$,
$i = 0,1,2,3,\ldots,k$, are idempotent matrices and
$B = I_{N\times N} - N^{-1}E_{N\times N} = \sum_{i=0}^{k+1} B_i$ with
$B_i = I_{n_i\times n_i} - n_i^{-1}E_{n_i\times n_i}$, so the conditions of the Baldessari
theorem (Theorem 6) are all satisfied. Therefore, $X'BX/a$
has a Chi-square distribution with $N-1$ degrees of freedom,
$X'B_iX/a$, $i = 0,1,2,3,\ldots,k$, have Chi-square distributions
with $n_i-1$ degrees of freedom and $X'B_{k+1}X/a$ has a Chi-square
distribution with k degrees of freedom. Further, the k+2
sums of squares on the right hand side of (3.7) are mutually
independent. Thus, each of the random variables

$$\frac{X'B_iX}{a(n_i-1)} \Big/ \frac{X'B_0X}{a(n_0-1)} = \frac{(n_i-1)S_i^2}{a(n_i-1)} \Big/ \frac{(n_0-1)S_0^2}{a(n_0-1)} = \frac{S_i^2}{S_0^2}$$

where $i = 1,2,3,\ldots,k$, follows an F distribution with $n_i-1$
and $n_0-1$ degrees of freedom.

Now, assume $n_1 = n_2 = n_3 = \cdots = n_k = m$. Then,
$S_i^2/S_0^2$, $i = 1,2,3,\ldots,k$, has an F distribution with $m-1$ and
$n_0-1$ degrees of freedom.


Define the random variables

$$W_U(k,m-1,n_0-1) = \max_i \frac{S_i^2}{S_0^2}$$

and

$$W_L(k,m-1,n_0-1) = \min_i \frac{S_i^2}{S_0^2}, \qquad i = 1,2,3,\ldots,k.$$

The distributions of $W_U(k,m-1,n_0-1)$ and $W_L(k,m-1,n_0-1)$
are known as the studentized largest and studentized
smallest Chi-square distributions, respectively. The upper
percentage point $D_U(k,m-1,n_0-1;r)$ of $W_U(k,m-1,n_0-1)$ and
the lower percentage point $D_L(k,m-1,n_0-1;1-r)$ of $W_L(k,m-1,n_0-1)$
were tabulated by Armitage, J.V. and Krishnaiah, P.R. and
are available in [1].

Then,

$$\Pr\{W_U(k,m-1,n_0-1) \le D_U(k,m-1,n_0-1;r)\}
 = \Pr\left\{\max_i \frac{S_i^2}{S_0^2} \le D_U(k,m-1,n_0-1;r)\right\} = r$$

and

$$\Pr\{\max_i S_i \le D_U(k,m-1,n_0-1;r)^{1/2}\,S_0\} = r.$$

Thus, an upper 100r% simultaneous prediction limit to
exceed the standard deviations of all k future samples
each of size m is

$$S_0\,D_U(k,m-1,n_0-1;r)^{1/2} \qquad (3.8)$$

Similarly, a lower 100(1-r)% simultaneous prediction limit
to be exceeded by the standard deviations of each of k future
samples of size m is

$$S_0\,D_L(k,m-1,n_0-1;1-r)^{1/2} \qquad (3.9)$$

This result is also the same as the one Hahn [5] obtained
for independent samples.


C. NUMERICAL EXAMPLES 

Suppose a gun is selected at random and fired 6
times and the resulting miss distances from a target are
measured. Let $S_0 = 1.00$ be the standard deviation of these
observations. A prediction interval for the standard
deviation of $n_1 = 10$ future attempts is desired.

Then, a two-sided 95% prediction interval to contain
the standard deviation $S_1$ for a single future sample of 10
observations is obtained as follows:

For $n_1 = 10$, $n_0 = 6$ and $r = 0.95$, $F(9,5;0.975) = 6.68$ and
$F(5,9;0.975) = 4.48$, and $S_0F(9,5;0.975)^{1/2} = (1.00)(6.68)^{1/2} = 2.58$,
$S_0F(5,9;0.975)^{-1/2} = (1.00)(4.48)^{-1/2} = 0.472$.

Substituting the above in equation (3.5) the required
prediction interval for $S_1$ is (0.472, 2.584). Next, an upper
95% simultaneous limit to exceed the standard deviation of
all 3 future samples of size 10 is obtained as follows:

For $n_0 = 6$, $m = 10$, $k = 3$ and $r = 0.95$, $D_U(3,9,5;0.95) = 6.41$
and $D_U(3,9,5;0.95)^{1/2}S_0 = (6.41)^{1/2}(1.00) = 2.762$ (see table 28,
page 41 [1]).
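The two-sided interval of this example can be reproduced with a few lines of arithmetic. The sketch below only restates equation (3.5) for the quoted table values; the function name `sd_prediction_interval` and its argument names are assumptions introduced for the illustration, and the lower limit uses the reciprocal relation $F(n_1-1,n_0-1;(1-r)/2) = 1/F(n_0-1,n_1-1;(1+r)/2)$.

```python
import math

def sd_prediction_interval(s0, f_future_given, f_given_future):
    """Two-sided interval (3.5) for the standard deviation of a future sample.

    f_future_given : F(n1-1, n0-1; (1+r)/2), upper point for the ratio S1^2/S0^2
    f_given_future : F(n0-1, n1-1; (1+r)/2), whose reciprocal is F(n1-1, n0-1; (1-r)/2)
    """
    lower = s0 / math.sqrt(f_given_future)
    upper = s0 * math.sqrt(f_future_given)
    return lower, upper

# Values from the example: S0 = 1.00, F(9,5;0.975) = 6.68, F(5,9;0.975) = 4.48.
lower, upper = sd_prediction_interval(1.00, 6.68, 4.48)
print(f"({lower:.3f}, {upper:.3f})")
```

Passing the percentage points as arguments keeps the sketch independent of any particular table or quantile routine.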
IV. PREDICTION INTERVALS FOR THE ADDITIONAL
OBSERVATIONS IN A FUTURE SAMPLE --
CORRELATED CASE

A prediction interval to contain a single future
observation and a simultaneous interval to contain each
of k additional observations of a random sample from a
normal distribution with mean $\mu$ and variance $\sigma^2$ were obtained
by Hahn [5]. In this chapter we extend these results to the
case where the samples are correlated and the covariance
matrix has the form defined in (3.3). In Section A a
prediction interval to contain a single additional
observation based on correlated observations is obtained, and
Section B deals with the construction of simultaneous
prediction intervals to contain k additional correlated
observations. Numerical examples are given in Section C.


A. A PREDICTION INTERVAL FOR A SINGLE FUTURE OBSERVATION 

Let $X_1, X_2, X_3, \ldots, X_n$ be independent and have identical
normal distributions with unknown mean $\mu$ and unknown
standard deviation $\sigma$. It is required to construct a
prediction interval for an additional observation $X_{n+1}$
based on the given n samples.

Let

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
S_n^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}_n\right)^2,$$

$$\bar{X}_{n+1} = \frac{1}{n+1}\sum_{i=1}^{n+1} X_i, \qquad
S_{n+1}^2 = \frac{1}{n+1}\sum_{i=1}^{n+1}\left(X_i-\bar{X}_{n+1}\right)^2.$$

Since

$$\bar{X}_{n+1} = \frac{n\bar{X}_n + X_{n+1}}{n+1},$$

the sum of squares $(n+1)S_{n+1}^2$ can be partitioned as follows:

$$\begin{aligned}
(n+1)S_{n+1}^2 &= \sum_{i=1}^{n+1}\left(X_i-\bar{X}_{n+1}\right)^2
   = \sum_{i=1}^{n}\left(X_i-\bar{X}_{n+1}\right)^2 + \left(X_{n+1}-\bar{X}_{n+1}\right)^2 \\
 &= \sum_{i=1}^{n}\left(X_i-\bar{X}_n+\bar{X}_n-\bar{X}_{n+1}\right)^2 + \left(X_{n+1}-\bar{X}_{n+1}\right)^2 \\
 &= \sum_{i=1}^{n}\left(X_i-\bar{X}_n\right)^2 + n\left(\bar{X}_n-\bar{X}_{n+1}\right)^2 + \left(X_{n+1}-\bar{X}_{n+1}\right)^2 \\
 &= nS_n^2 + \frac{n}{(n+1)^2}\left(X_{n+1}-\bar{X}_n\right)^2 + \frac{n^2}{(n+1)^2}\left(X_{n+1}-\bar{X}_n\right)^2 \\
 &= nS_n^2 + \frac{n}{n+1}\left(X_{n+1}-\bar{X}_n\right)^2 \qquad (4.1)
\end{aligned}$$

Expressing the sum of squares in (4.1) as quadratic forms
we get

$$X'BX = X'B_1X + X'B_2X \qquad (4.2)$$

where $B$ and $B_1$ are idempotent matrices and

$$B = I_{(n+1)\times(n+1)} - (n+1)^{-1}E_{(n+1)\times(n+1)} \quad\text{and}\quad B_1 = I_{n\times n} - n^{-1}E_{n\times n}.$$

Since $(n+1)S_{n+1}^2/\sigma^2$ and $nS_n^2/\sigma^2$ have Chi-square
distributions with n and n-1 degrees of freedom respectively and
$n(X_{n+1}-\bar{X}_n)^2/(n+1)$ is non-negative, Hogg and Craig's theorem
(Theorem 5) applies to equation (4.1). Therefore, the two
quadratic forms on the right hand side of (4.1) are mutually
independent and $n(X_{n+1}-\bar{X}_n)^2/(n+1)\sigma^2$ has a Chi-square
distribution with 1 degree of freedom. It also follows
that the matrix $B_2$ in equation (4.2) is also idempotent.



Now, suppose $X_{(n+1)\times 1}$ is a vector random variable having
a multivariate normal distribution with mean $\mu_{(n+1)\times 1} = (\mu,\mu,\mu,\ldots,\mu)'$
and covariance matrix $V_{(n+1)\times(n+1)}$ which has the following
structure:

$$V_{(n+1)\times(n+1)} = \tfrac12\left(H_{(n+1)\times(n+1)} + H'_{(n+1)\times(n+1)}\right) + a\left(I_{(n+1)\times(n+1)} - E_{(n+1)\times(n+1)}\right) \qquad (4.3)$$

where

$$H_{(n+1)\times(n+1)} = \begin{pmatrix} h_1 & h_1 & \cdots & h_1 \\ h_2 & h_2 & \cdots & h_2 \\ \vdots & \vdots & & \vdots \\ h_{n+1} & h_{n+1} & \cdots & h_{n+1} \end{pmatrix},$$

$H'$ is the transpose of $H$, $a$ and $h_i$ $(i = 1,2,3,\ldots,n+1)$ are
positive constants, $I$ is an $(n+1)\times(n+1)$ identity matrix and
$E$ is an $(n+1)\times(n+1)$ matrix whose elements are all unity.

Since the matrices $B$, $B_1$ and $B_2$ in equation (4.2) are
idempotent and

$$B_{(n+1)\times(n+1)} = B_1 + B_2 = I_{(n+1)\times(n+1)} - (n+1)^{-1}E_{(n+1)\times(n+1)},$$

we may apply the Baldessari theorem (Theorem 6) to equation
(4.2) to conclude that the two quadratic forms on the right
hand side of the equation, each divided by $a$, have central
Chi-square distributions with n-1 and 1 degrees of freedom
respectively and are mutually independent.

Thus, the random variable

$$\frac{X'B_2X}{a} \Big/ \frac{X'B_1X}{a(n-1)} = \frac{\left(X_{n+1}-\bar{X}_n\right)^2}{\left(1+\frac{1}{n}\right)S^2}$$

has an F distribution with 1 and n-1 degrees of freedom.

We obtain a prediction interval for $X_{n+1}$ as

$$\Pr\left\{\frac{\left(X_{n+1}-\bar{X}_n\right)^2}{\left(1+\frac{1}{n}\right)S^2} < F(1,n-1;r)\right\} = r$$

or equivalently

$$\Pr\left\{\bar{X}_n - S\left(1+\tfrac{1}{n}\right)^{1/2}F(1,n-1;r)^{1/2} < X_{n+1} < \bar{X}_n + S\left(1+\tfrac{1}{n}\right)^{1/2}F(1,n-1;r)^{1/2}\right\} = r$$

where r is the chosen confidence coefficient and
$F(1,n-1;r)$ is the upper 100r% point of the F distribution
with 1 and n-1 degrees of freedom.

Now recall that $F(1,n-1;r)^{1/2} = t(n-1;(1+r)/2) = -t(n-1;(1-r)/2)$.
This yields the following two-sided prediction interval
to contain the additional observation $X_{n+1}$:

$$\left(\bar{X}_n + S\left(1+\tfrac{1}{n}\right)^{1/2}t(n-1;(1-r)/2),\; \bar{X}_n + S\left(1+\tfrac{1}{n}\right)^{1/2}t(n-1;(1+r)/2)\right) \qquad (4.4)$$
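A simulation gives a quick check that interval (4.4) keeps its coverage when the observations follow structure (4.3). The sketch rests on assumptions that are not part of the text: the values of $h_i$, $a$ and the common mean are arbitrary choices for which $V$ is positive definite, $S$ is taken as the sample standard deviation with divisor $n-1$, and the percentage point $t(4;0.975) = 2.776$ is the table value used in Section C.

```python
import numpy as np

def structured_cov(h, a):
    """V = (H + H')/2 + a(I - E), where row i of H is (h_i, ..., h_i)."""
    h = np.asarray(h, dtype=float)
    m = h.size
    H = np.tile(h[:, None], (1, m))
    return 0.5 * (H + H.T) + a * (np.eye(m) - np.ones((m, m)))

n = 5
a = 1.0
h = np.linspace(2.0, 3.0, n + 1)         # assumed values; V is positive definite for them
V = structured_cov(h, a)
L = np.linalg.cholesky(V)                # raises LinAlgError if V is not positive definite

rng = np.random.default_rng(2024)
reps = 200_000
X = 50.0 + rng.standard_normal((reps, n + 1)) @ L.T   # common mean 50.0 is arbitrary

xbar = X[:, :n].mean(axis=1)             # mean of the given sample
s = X[:, :n].std(axis=1, ddof=1)         # divisor n - 1 (assumption stated above)
half_width = 2.776 * np.sqrt(1 + 1 / n) * s

future = X[:, n]                         # the additional observation X_{n+1}
coverage = np.mean(np.abs(future - xbar) < half_width)
print(round(float(coverage), 3))
```

The estimate should be close to 0.95, in line with the result that the interval for the independent case carries over to this covariance structure.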





B. SIMULTANEOUS PREDICTION INTERVALS FOR k
FUTURE OBSERVATIONS

Let $X_1, X_2, X_3, \ldots, X_n$ be the values of a given sample and
$X_{n+1}, X_{n+2}, X_{n+3}, \ldots, X_{n+k}$ the values of k future observations.
We assume that the sample observations $X_1, X_2, X_3, \ldots, X_n,
X_{n+1}, X_{n+2}, \ldots, X_{n+k}$ are correlated and have a multivariate
normal distribution with mean $\mu = (\mu,\mu,\mu,\ldots,\mu)'$ and
covariance matrix $V$ which has the form (4.3).

In order to construct a simultaneous prediction interval
for $X_{n+1}, X_{n+2}, X_{n+3}, \ldots, X_{n+k}$ we first establish that

(i) $\sum_{i=1}^{n}(X_i-\bar{X}_n)^2/a = nS^2/a$ has a Chi-square distribution
with n-1 degrees of freedom,

(ii) the vector variable $Z = (Z_1, Z_2, Z_3, \ldots, Z_k)'$, where
$Z_i = X_{n+i} - \bar{X}_n$, has a multivariate normal
distribution, and

(iii) the vector variable $Z$ and $nS^2$ are statistically
independent.

If $nS^2/a = \sum_{i=1}^{n}(X_i-\bar{X}_n)^2/a$ is expressed as a quadratic form
$X'BX$, where $X = (X_1, X_2, X_3, \ldots, X_n, X_{n+1}, X_{n+2}, \ldots, X_{n+k})'$,
then a necessary and sufficient condition for $X'BX$ to have
a Chi-square distribution is that $BV$ is idempotent (see
Theorem 2).

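To show that $BV$ is idempotent, let the matrices $H$, $E$ and
$I$ of (4.3) and the matrix $V$ be partitioned as follows:

$$H = \begin{pmatrix} H_1 & H_2 \\ H_3 & H_4 \end{pmatrix}, \quad
E = \begin{pmatrix} E_1 & E_2 \\ E_3 & E_4 \end{pmatrix}, \quad
I = \begin{pmatrix} I_1 & 0 \\ 0 & I_4 \end{pmatrix}, \quad
V = \begin{pmatrix} V_1 & V_2 \\ V_3 & V_4 \end{pmatrix},$$

where the blocks with subscripts 1, 2, 3 and 4 are of order
$n\times n$, $n\times k$, $k\times n$ and $k\times k$ respectively, and

$$V_1 = \tfrac12\left(H_1+H_1'\right) + a\left(I_1-E_1\right), \qquad
V_2 = \tfrac12\left(H_2+H_3'\right) - aE_2.$$

We know

$$B = \begin{pmatrix} B_1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B_1 = a^{-1}\left(I_1 - n^{-1}E_1\right),$$

and

$$BV = \begin{pmatrix} B_1V_1 & B_1V_2 \\ 0 & 0 \end{pmatrix} \qquad (4.5)$$

In equation (4.5), $B_1V_1$ and $B_1V_2$ can be simplified as
follows. Since every row of $H_1$ and $H_2$ is constant,
$(I_1-n^{-1}E_1)H_1' = 0$, $(I_1-n^{-1}E_1)H_3' = 0$,
$(I_1-n^{-1}E_1)E_1 = 0$ and $(I_1-n^{-1}E_1)E_2 = 0$, so that

$$B_1V_1 = a^{-1}\left(I_1-n^{-1}E_1\right)\left\{\tfrac12\left(H_1+H_1'\right) + a\left(I_1-E_1\right)\right\}
 = \left(I_1-n^{-1}E_1\right) + \frac{1}{2a}\left(H_1 - \frac{\alpha}{n}E_1\right),$$

where $\alpha = \sum_{i=1}^{n} h_i$, and

$$B_1V_2 = a^{-1}\left(I_1-n^{-1}E_1\right)\left\{\tfrac12\left(H_2+H_3'\right) - aE_2\right\}
 = \frac{1}{2a}\left(H_2 - \frac{\alpha}{n}E_2\right).$$

Thus

$$BV = \begin{pmatrix} \left(I_1-n^{-1}E_1\right) + \frac{1}{2a}\left(H_1-\frac{\alpha}{n}E_1\right) & \frac{1}{2a}\left(H_2-\frac{\alpha}{n}E_2\right) \\ 0 & 0 \end{pmatrix}$$

and

$$(BV)(BV) = \begin{pmatrix} (B_1V_1)^2 & (B_1V_1)(B_1V_2) \\ 0 & 0 \end{pmatrix} \qquad (4.6)$$

But

$$\begin{aligned}
(B_1V_1)^2 &= \left\{\left(I_1-n^{-1}E_1\right) + \frac{1}{2a}\left(H_1-\frac{\alpha}{n}E_1\right)\right\}^2 \\
 &= \left(I_1-n^{-1}E_1\right)^2 + \frac{1}{2a}\left(I_1-n^{-1}E_1\right)\left(H_1-\frac{\alpha}{n}E_1\right) \\
 &\quad + \frac{1}{2a}\left(H_1-\frac{\alpha}{n}E_1\right)\left(I_1-n^{-1}E_1\right) + \frac{1}{4a^2}\left(H_1-\frac{\alpha}{n}E_1\right)^2 \\
 &= \left(I_1-n^{-1}E_1\right) + \frac{1}{2a}\left(H_1-\frac{\alpha}{n}E_1\right) = B_1V_1,
\end{aligned}$$

because $(I_1-n^{-1}E_1)^2 = I_1-n^{-1}E_1$,
$(I_1-n^{-1}E_1)(H_1-\frac{\alpha}{n}E_1) = H_1-\frac{\alpha}{n}E_1$,
$(H_1-\frac{\alpha}{n}E_1)(I_1-n^{-1}E_1) = 0$ and $(H_1-\frac{\alpha}{n}E_1)^2 = 0$,
and

$$(B_1V_1)(B_1V_2) = \left\{\left(I_1-n^{-1}E_1\right) + \frac{1}{2a}\left(H_1-\frac{\alpha}{n}E_1\right)\right\}\frac{1}{2a}\left(H_2-\frac{\alpha}{n}E_2\right)
 = \frac{1}{2a}\left(H_2-\frac{\alpha}{n}E_2\right) = B_1V_2,$$

because $(I_1-n^{-1}E_1)(H_2-\frac{\alpha}{n}E_2) = H_2-\frac{\alpha}{n}E_2$ and
$(H_1-\frac{\alpha}{n}E_1)(H_2-\frac{\alpha}{n}E_2) = 0$.

Therefore

$$(BV)(BV) = \begin{pmatrix} B_1V_1 & B_1V_2 \\ 0 & 0 \end{pmatrix} = BV.$$

Since $BV$ is idempotent, $X'BX = nS^2/a$ has a Chi-square distribution
with n-1 degrees of freedom.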
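The idempotency of $BV$ can also be confirmed numerically for a particular matrix of the form (4.3). In the sketch below the sizes $n$, $k$ and the constants $h_i$, $a$ are assumed values chosen for the illustration, and $B$ carries the factor $a^{-1}$ so that $X'BX = nS^2/a$.

```python
import numpy as np

def structured_cov(h, a):
    """V = (H + H')/2 + a(I - E), where row i of H is (h_i, ..., h_i)."""
    h = np.asarray(h, dtype=float)
    m = h.size
    H = np.tile(h[:, None], (1, m))
    return 0.5 * (H + H.T) + a * (np.eye(m) - np.ones((m, m)))

n, k = 5, 3                               # illustrative sizes
a = 1.5
h = np.linspace(2.0, 3.0, n + k)          # assumed positive constants h_1, ..., h_{n+k}
V = structured_cov(h, a)

# B has the centering matrix (I - E/n)/a in its leading n x n block and zeros elsewhere.
B = np.zeros((n + k, n + k))
B[:n, :n] = (np.eye(n) - np.ones((n, n)) / n) / a

BV = B @ V
print(np.allclose(BV @ BV, BV))           # BV is idempotent
print(np.isclose(np.trace(BV), n - 1))    # trace of an idempotent matrix equals its rank
```

The trace check reflects the degrees of freedom: the rank of $BV$ is $n-1$ whatever the values of the $h_i$.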


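Next, $Z = (Z_1, Z_2, Z_3, \ldots, Z_k)'$ can be expressed as

$$Z_{k\times 1} = C'_{k\times(n+k)}\,X_{(n+k)\times 1}
\qquad\text{where}\qquad
C' = \left(-\tfrac{1}{n}E_{k\times n} \;:\; I_{k\times k}\right).$$

$Z$ has a multivariate normal distribution with mean $C'\mu = 0$
and covariance matrix $C'VC$ as shown below:

$$C'\mu = \left(-\tfrac{1}{n}E_{k\times n} : I_{k\times k}\right)\mu_{(k+n)\times 1} = (-\mu+\mu,\ldots,-\mu+\mu)' = 0.$$

Since every row of $C'$ sums to zero, $C'H' = 0$ and $C'E = 0$, so that

$$C'V = \frac12\begin{pmatrix}
\left(h_{n+1}-\frac{\alpha}{n}\right)E_{1\times(n+k)} \\
\left(h_{n+2}-\frac{\alpha}{n}\right)E_{1\times(n+k)} \\
\vdots \\
\left(h_{n+k}-\frac{\alpha}{n}\right)E_{1\times(n+k)}
\end{pmatrix}
+ a\left(-\tfrac{1}{n}E_{k\times n} : I_{k\times k}\right) \qquad (4.7)$$

where $\alpha = \sum_{i=1}^{n} h_i$. The first term of (4.7) vanishes when
post-multiplied by $C$, again because the rows of $C'$ sum to zero. Hence

$$C'VC = a\left(-\tfrac{1}{n}E_{k\times n} : I_{k\times k}\right)\left(-\tfrac{1}{n}E_{k\times n} : I_{k\times k}\right)'
 = a\left(\tfrac{1}{n}E_{k\times k} + I_{k\times k}\right)
 = a\begin{pmatrix}
1+\frac1n & \frac1n & \cdots & \frac1n \\
\frac1n & 1+\frac1n & \cdots & \frac1n \\
\vdots & \vdots & & \vdots \\
\frac1n & \frac1n & \cdots & 1+\frac1n
\end{pmatrix}.$$

Therefore, $Z \sim N\left(0,\; a\left(I_{k\times k} + n^{-1}E_{k\times k}\right)\right)$.

Also, $Z = C'X$ and $X'BX$ are statistically independent since
(see Theorem 4)

$$C'VB = \frac12\begin{pmatrix}
\left(h_{n+1}-\frac{\alpha}{n}\right)E_{1\times(n+k)} \\
\vdots \\
\left(h_{n+k}-\frac{\alpha}{n}\right)E_{1\times(n+k)}
\end{pmatrix}B
+ a\left(-\tfrac{1}{n}E_{k\times n} : I_{k\times k}\right)B = 0,$$

both terms being zero because the columns of $I_1 - n^{-1}E_1$ sum to zero
and $B$ is zero outside its leading $n\times n$ block.

Thus, each $Z_i$, $i = 1,2,3,\ldots,k$, is normally distributed
with mean 0 and variance $a(1 + 1/n)$ and is independent
of $nS^2$.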
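The covariance matrix of $Z$ and its independence from the quadratic form can be verified numerically for a specific matrix of the form (4.3). The sizes and constants below are assumed values chosen for the illustration; the final check computes the correlation between two of the $Z_i$ implied by $a(I + n^{-1}E)$.

```python
import numpy as np

def structured_cov(h, a):
    """V = (H + H')/2 + a(I - E), where row i of H is (h_i, ..., h_i)."""
    h = np.asarray(h, dtype=float)
    m = h.size
    H = np.tile(h[:, None], (1, m))
    return 0.5 * (H + H.T) + a * (np.eye(m) - np.ones((m, m)))

n, k = 5, 3                               # illustrative sizes
a = 1.5
h = np.linspace(2.0, 3.0, n + k)          # assumed positive constants
V = structured_cov(h, a)

# C' = (-(1/n) E_{k x n} : I_k), so Z = C'X has components X_{n+i} - mean(X_1, ..., X_n).
Ct = np.hstack([-np.ones((k, n)) / n, np.eye(k)])

# B as in the quadratic form for nS^2/a: (I - E/n)/a in the leading block.
B = np.zeros((n + k, n + k))
B[:n, :n] = (np.eye(n) - np.ones((n, n)) / n) / a

cov_Z = Ct @ V @ Ct.T
print(np.allclose(cov_Z, a * (np.eye(k) + np.ones((k, k)) / n)))   # C'VC = a(I + E/n)
print(np.allclose(Ct @ V @ B, 0.0))                                 # C'VB = 0

d = np.sqrt(np.diag(cov_Z))
corr = cov_Z / np.outer(d, d)
print(np.isclose(corr[0, 1], 1 / (n + 1)))                          # correlation 1/(n+1)
```

None of the three results depends on the individual $h_i$, since these drop out of $C'VC$ and $C'VB$.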



When the $Z_i$, $i = 1,2,3,\ldots,k$, are standardized as
$Z_i\big/\left[a\left(1+\tfrac1n\right)\right]^{1/2}$, the variables

$$T_i = \frac{Z_i\Big/\left[a\left(1+\frac1n\right)\right]^{1/2}}{\left[\dfrac{(n-1)S^2}{a(n-1)}\right]^{1/2}}
 = \frac{X_{n+i}-\bar{X}_n}{S\left(1+\frac1n\right)^{1/2}}, \qquad i = 1,2,3,\ldots,k,$$

are jointly distributed according to the multivariate
generalization of the Student t-distribution with n-1 degrees
of freedom and correlation matrix $\Sigma$ defined by

$$\Sigma_{k\times k} = \begin{pmatrix}
1 & \frac{1}{n+1} & \cdots & \frac{1}{n+1} \\
\frac{1}{n+1} & 1 & \cdots & \frac{1}{n+1} \\
\vdots & \vdots & & \vdots \\
\frac{1}{n+1} & \frac{1}{n+1} & \cdots & 1
\end{pmatrix}.$$

To find a two-sided 100r% simultaneous prediction
interval to contain each of k additional observations, let
U be such that

$$r = \int_{-U}^{U}\int_{-U}^{U}\cdots\int_{-U}^{U} f(t_1,t_2,\ldots,t_k)\,dt_1\,dt_2\cdots dt_k \qquad (4.8)$$

where $f(t_1,t_2,\ldots,t_k)$ is the joint probability density function
of $T_1, T_2, \ldots, T_k$. Then

$$\Pr\left\{-U < \frac{X_{n+1}-\bar{X}_n}{S\left(1+\frac1n\right)^{1/2}} < U,\; \ldots,\; -U < \frac{X_{n+k}-\bar{X}_n}{S\left(1+\frac1n\right)^{1/2}} < U\right\} = r.$$

The resulting 100r% simultaneous prediction interval to
contain the values $X_{n+1}, X_{n+2}, X_{n+3}, \ldots, X_{n+k}$ of all k future
observations is

$$\bar{X}_n \pm U\left(1+\tfrac{1}{n}\right)^{1/2}S \qquad (4.9)$$

For selected values of r, the values of U to satisfy the
equation (4.8) were tabulated by Hahn and are available
in [4].


C. NUMERICAL EXAMPLES 
Based upon a random sample of observations from a normal
distribution whose mean and standard deviation are unknown,
the following data is obtained:

51.4, 49.5, 48.7, 49.3 and 51.6

From the data, the sample mean $\bar{X}$ and sample standard
deviation S are calculated as

$$\bar{X} = (51.4+49.5+48.7+49.3+51.6)/5 = 50.10$$

and

$$S^2 = \{(51.4-50.1)^2 + (49.5-50.1)^2 + (48.7-50.1)^2 + (49.3-50.1)^2 + (51.6-50.1)^2\}/5 = 6.9/5 = 1.38,$$

$$S = 1.175.$$

Then, a two-sided prediction interval to contain a single
future observation $X_{n+1}$ with 95% probability is obtained
as follows (see equation (4.4)):

For n=5, r=0.95, from the Student's t-tables
$t(4;0.975) = 2.776$. Substituting the observed values
in (4.4), a 95% prediction interval for $X_{n+1}$, a future
observation, is given by

(46.527, 53.673)
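The arithmetic of this example can be retraced as follows. The sketch mirrors the computation as it appears in the text, that is, it uses the divisor 5 for the sample variance and the table value $t(4;0.975) = 2.776$; the variable names are assumptions made for the illustration. Small differences in the third decimal place arise because the text rounds S to 1.175 before substituting.

```python
import math

data = [51.4, 49.5, 48.7, 49.3, 51.6]
n = len(data)

xbar = sum(data) / n
# Divisor n, following the arithmetic printed in the example.
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / n)

t_value = 2.776                              # t(4; 0.975) from the t-table
half_width = t_value * math.sqrt(1 + 1 / n) * s

lower, upper = xbar - half_width, xbar + half_width
print(f"mean = {xbar:.2f}, S = {s:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```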


Next, a two-sided 95% simultaneous prediction interval to
contain each of 10 future observations is obtained using
equation (4.9):

For k=10, n=5 and r=0.95 from the tables in [4]


ia ova 1, %s — + 
ial) eee ese UCls es - 50.1 2 Ses (alan ee) 
and the required prediction interval is given by 


Gee oo + La) 
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